System and Methods for Improved Linguistic Pattern Matching

ABSTRACT

A system and method for reducing the number of false negatives, with minimal impact on false positive search results, while allowing searches to return phonetic equivalents, misspellings, common short names, and other such applicable information. New N-Gram based indexing and search systems and methods facilitate searching of data containing non-Arabic letters, such as numbers, symbols, and foreign language characters. The ability to customize indexing and other features further enhances search results. Linguistic pattern matching search results are improved based on dynamically modified search attributes.

The present application is a continuation of and claims priority to U.S. patent application Ser. No. 09/957,465 filed on Sep. 21, 2001, which claims priority to U.S. Provisional Patent Application No. 60/234,215 filed on Sep. 21, 2000, both of which are incorporated herein by reference in their entireties.

This application includes material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office files or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF THE INVENTION

The present invention relates to the field of linguistic pattern matching for database retrieval purposes.

BACKGROUND OF THE INVENTION

Typographical errors, phonetic misspellings, abbreviations, common short names, and sequence variation are but a few of the problems facing searchers of computerized records. For example, when calling directory assistance, if a request is made for the telephone number of Thomas Lee without spelling either name, a telephone operator may search for Tomas Leigh, Thomas Lea, or any of several combinations thereof. In addition, when Thomas Lee was entered into the database, he may have accidentally been entered as Lee Thomas, and either or both of his names may have been misspelled.

The task of properly searching a computerized database becomes even more complex when names composed of foreign characters are used. Examples of such databases include those containing genealogical records, foreign city names, foreign names, or company names.

To overcome these problems, some in the prior art have created techniques involving character manipulation. Soundex, which is one of the most widely used of these techniques, is a simple process of associating certain letters with numbers and dropping other letters. A search is performed on the result, and that search may yield names that sound like or otherwise approximate the name in question.

Others in the prior art have described schemes through which result sets may be generated based on manipulation of an input word. One such technique, disclosed by U.S. Pat. No. 4,833,610 by Antonio Zamora et al., separates and alphabetizes the consonants and vowels of a given word, and compares a transformed input string to transformed database entries. Another technique, disclosed by U.S. Pat. No. 5,737,723 by Michael Dennis Riley et al., compares dictionary words based on the phonetic confusability of the words. Still another method, disclosed by U.S. Pat. No. 5,724,597 by Robert John Cuthbertson et al., involves successively applying Soundex and other techniques and generating a match list based on the results.

While Soundex and other such schemes may allow the reporting of “near matches,” the number of false positives reported by these schemes can prohibit their use in large databases. For example, in a database of 1000 names, if the Soundex routine had a false positive rate of 0.002, only two false names would be returned. However, when the database grows to 100,000 names, two hundred false positives would be reported.

SUMMARY OF THE INVENTION

The present invention improves upon the prior art by reducing the number of false positive search results while allowing searches to return phonetic equivalents, misspellings, common short names, and other such applicable information. In addition, the present invention allows the searching of names containing non-Arabic letters, such as numbers, symbols, and foreign language characters.

The present invention further provides an improved linguistic search by applying not only those techniques used in the prior art, such as Soundex, but also new techniques. One such new technique is an N-Gram search. An N-Gram is a subsequence of N characters from the full word which is used for indexing. For example, the word “example” has the following N-Grams, where N=3: “exa”, “xam”, “amp”, “mpl”, and “ple”.
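As a minimal sketch (in Python, and not the implementation used by the present invention), the N=3 decomposition of “example” can be reproduced by sliding a window of N characters across the word. The function name ngrams is hypothetical.

```python
def ngrams(word: str, n: int = 3) -> list:
    """Return every contiguous run of n characters in word, left to right."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # A word shorter than n yields no N-grams because the range is empty.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(ngrams("example"))  # ['exa', 'xam', 'amp', 'mpl', 'ple']
```

Each N-gram would then serve as an index key, so two spellings that share several N-grams can be found through any one of them.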

In a preferred embodiment, the present invention may be customized to enhance search results by allowing users to tune the present invention based on user data. By way of example, without intending to limit the present invention, users may select the number of characters to be contained in each N-Gram, and users may also select between types of N-Grams used by the present invention. N-Gram types available may include, but are not limited to, Alphabetic, Consonant, FDI, FML, and Numeric.

Customized search results may be further refined by evaluating matches based on user preferences. One method employed by the present invention to evaluate matches is the Edit Distance method. The Edit Distance method calculates the number of characters which must be inserted, deleted, changed, or transposed in one word to obtain a match with another word, and accepts or rejects results based on the Edit Distance. For example, the edit distance between “Michael” and “Mikhail” is 2 (replace “c” by “k” and “e” by “i”).

In addition to a user-customized configuration, the present invention may also be configured to dynamically select configuration options. Dynamic selection allows the present invention to modify sensitivity levels based on search string attributes, such as, but not limited to, search string length. Search sets resulting from such dynamic selections may also be refined as described in the previous paragraph.

The present invention overcomes limitations and problems with conventional linguistic pattern matching systems by providing a system and methods for comparing a query against data contained within a database comprising the steps of: (a) receiving the query; (b) converting a plurality of information from the query, by at least one linguistic pattern analytical tool, into a plurality of linguistic pattern strings; (c) matching at least one of the plurality of linguistic pattern strings with at least one stored linguistic pattern string contained within a database; (d) repeating steps (b) and (c) above for each of the at least one linguistic pattern analytical tool; and (e) combining the matches of each of the at least one linguistic pattern analytical tool to provide a combined result.
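Steps (a) through (e) can be illustrated with a minimal Python sketch. The two toy tools (trigrams and initials), the record layout, and every function name are hypothetical stand-ins for the linguistic pattern analytical tools. For brevity the sketch rebuilds each stored index on every query, where a real system would build it once.

```python
from collections import defaultdict
from typing import Callable

# A linguistic pattern analytical tool is modeled as a function that turns
# text into a list of pattern strings (hypothetical interface).
Tool = Callable[[str], list]


def trigrams(text: str) -> list:
    """Toy tool: alphabetic 3-grams of the lowercased letters."""
    s = "".join(c for c in text.lower() if c.isalpha())
    return [s[i:i + 3] for i in range(len(s) - 2)]


def initials(text: str) -> list:
    """Toy tool: the initials of the words, as a single pattern string."""
    return ["".join(w[0] for w in text.lower().split())]


def build_index(records: dict, tool: Tool) -> dict:
    """Map each stored pattern string to the ids of the records producing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for pattern in tool(text):
            index[pattern].add(rec_id)
    return index


def screen(query: str, records: dict, tools: dict) -> dict:
    """Steps (a)-(e): return {record id: names of the tools that matched it}."""
    combined = defaultdict(set)
    for name, tool in tools.items():              # (d) repeat for each tool
        index = build_index(records, tool)         # stored pattern strings
        for pattern in tool(query):                # (b) convert the query
            for rec_id in index.get(pattern, ()):  # (c) match stored patterns
                combined[rec_id].add(name)         # (e) combine the matches
    return dict(combined)


records = {1: "Thomas Lee", 2: "Tomas Leigh", 3: "Maria Gonzalez"}
# Records 1 and 2 are matched by both tools; record 3 by neither.
print(screen("Thomas Lea", records, {"trigram": trigrams, "initials": initials}))
```

Recording which tools produced each match, rather than only a yes/no answer, leaves room for later stages to weigh agreement between tools.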

The present invention also provides a system and method for comparing query information about a party against a plurality of restricted parties information contained within a database comprising the steps of: (a) receiving query information about a party; (b) converting the plurality of information about the party into a plurality of party linguistic pattern strings by at least one linguistic pattern analytical tool; (c) matching at least one of the plurality of party linguistic pattern strings with at least one stored linguistic pattern string of said plurality of restricted parties information contained within said database; (d) repeating steps (b) and (c) above for each one of the at least one linguistic pattern analytical tool; and (e) combining the linguistic pattern matches of each one of the at least one linguistic pattern analytical tool to provide a combined result.

The present invention also provides a method for comparing a query against data contained within a database comprising the steps of: (a) receiving a query containing a plurality of information; (b) converting the plurality of information by an Alphabetic N-gram based linguistic pattern analytical tool into a plurality of Alphabetic linguistic pattern strings; (c) matching at least one of the plurality of Alphabetic linguistic pattern strings with at least one stored linguistic pattern string contained within the database to provide a plurality of Alphabetic matches; (d) converting the plurality of information by a Consonant N-gram based linguistic pattern analytical tool into a plurality of Consonant linguistic pattern strings; (e) matching at least one of the plurality of Consonant linguistic pattern strings with the at least one stored linguistic pattern string contained within the database to provide a plurality of Consonant matches; (f) converting the plurality of information by a Numeric N-gram based linguistic pattern analytical tool into a plurality of Numeric linguistic pattern strings; (g) matching at least one of the plurality of Numeric linguistic pattern strings with the at least one stored linguistic pattern string contained within the database to provide a plurality of Numeric matches; (h) converting the plurality of information by an Fdi N-gram based linguistic pattern analytical tool into a plurality of Fdi linguistic pattern strings; (i) matching at least one of the plurality of Fdi linguistic pattern strings with the at least one stored linguistic pattern string contained within the database to provide a plurality of Fdi matches; (j) converting the plurality of information by an Fml N-gram based linguistic pattern analytical tool into a plurality of Fml linguistic pattern strings; (k) matching at least one of the plurality of Fml linguistic pattern strings with the at least one stored linguistic pattern string contained within the database to provide a plurality of Fml matches; and (l) combining the Alphabetic matches, Consonant matches, Numeric matches, Fdi matches, and Fml matches to provide a combined result.

BRIEF DESCRIPTION OF THE DRAWINGS

For a further understanding of the nature, objects, and advantages of the present invention, reference should be had to the following detailed description, read in conjunction with the following drawings, wherein like reference numerals denote like elements and wherein:

FIG. 1 is a schematic diagram showing an Internet based system for user interaction with a preferred embodiment of the present invention.

FIG. 2 is a schematic diagram illustrating the flow or steps of linguistic pattern matching of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The following is a functional description of the present invention from an administrative perspective. These functions include starting the present invention in various modes, stopping the present invention, and tuning different parameters that control how the present invention performs a linguistic pattern match.

In a preferred embodiment of the present invention, the user 101 utilizing computer 103 could use the system for restricted party screening. As seen in FIG. 1, the user 101 could be involved in exporting and shipping goods to a receiving party 105. In order for the user 101 to verify that the receiving party 105 is approved for receiving certain goods, the user 101 could screen the receiving party 105 against a database 124 containing the names of restricted parties.

Using computer 103, which is connected to the internet 110 via communications path 109, the user accesses the restricted party program 122 on server 120 for screening against the restricted party database 124. A firewall 115 may be incorporated to limit access. The user 101 would input information about the query, or the receiving party 105 in this example, into computer 103, which is then transmitted through the internet 110 to server 120 and utilized by the restricted party program 122. As will be described in more detail below, the restricted party program 122 breaks down the information about the receiving party 105 into various categories and linguistic patterns. The restricted party program 122 then compares the linguistic patterns from the query with the linguistic patterns and categories of parties contained within the restricted party database 124.

By utilizing the preferred embodiment of the present invention, the user 101 would be able to determine with a high level of accuracy whether the receiving party 105 is contained within the restricted party database 124, so that the user 101 could decide whether to move forward with the transaction and ship goods to the receiving party 105 along path 107. The user 101 can implement the restricted party screening process at different points in the business process: the user 101 could prescreen parties they do regular business with, or conduct the restricted party screening during the middle of a transaction or prior to a transaction.

In the prescreening mode, the company or the user 101 would have a limited set of business partners or receiving parties 105 which are relatively static. The user 101 would do an initial restricted party screening of all the receiving parties 105 so that the user 101 is comfortable that these receiving parties are not restricted. This allows the user 101 to freely do business with the receiving parties 105 without needing to re-screen them each time they engage in a business transaction. The present invention, through the restricted party program 122, is able to monitor information changes about the receiving parties 105 and changes in the restricted party database 124 and automatically re-screen the receiving parties 105. Because it is necessary to ensure that receiving parties 105 who were not initially restricted have not since been added to a restricted parties list, the present invention performs an automatic re-screening upon modifications to the receiving party 105 data or the restricted party database 124. Additional applications within the restricted party program 122 allow the present invention to monitor changes to the restricted party database 124, the receiving parties 105, and the users 101 to automatically invoke a re-screening when required.

The user 101 may also wish to invoke the restricted party screening during a transaction, since parties involved within a transaction may vary on a regular basis. As new business partners are introduced on new transactions, there is no way to prescreen the receiving parties 105 before the transactions have started. In this case, the restricted party program 122 and screening process are invoked after the transaction has been created. The restricted party screening would run as part of the compliance screenings required on an export transaction. In addition, the preferred embodiment of the present invention would automatically perform a re-screening if the transaction is modified. The re-screening is necessary to ensure that the user 101, the receiving party 105, or other parties involved in the transaction do not circumvent the screening process.

The last method described utilizing the preferred embodiment of the present invention would incorporate the restricted party screening prior to a transaction, such that the user 101 would not initiate the transaction until they have confirmed that the receiving party 105 is not on a restricted party list. An example would be an e-commerce business where customers order items or goods over the internet 110. The e-commerce business could verify that the customer or receiving party 105 who desires to order goods is not on a restricted party list within a database 124 prior to allowing the receiving party 105 to actually order items. In this example, the customer or receiving party 105 would be required to enter information via communication path 113. The entered information would be used by the restricted party program 122 to verify that such customer or receiving party 105 is not within a restricted party list contained within the restricted party database 124. Once the customer or receiving party 105 has been screened and not found on any restricted party list, the customer or receiving party 105 would be allowed to proceed with ordering items, which are then sent from the user 101 along path 107.

The restricted party screening or database screening process of the present invention may be used by itself to compare entered information with that contained within a database, or it may be used as part of a broader system such as a landed cost import and export system. The landed cost import and export system could be used to determine locations for shipping goods, calculate the costs for shipping goods to a receiving party, determine taxes or duties for shipping goods to a receiving party, and screen the receiving party against a restricted party list to determine whether or not the receiving party is allowed to receive the requested goods.

As will be described in more detail below, the present invention uses linguistic mapping analytical tools and algorithms to filter and find matching records or character strings contained within a database. The linguistic pattern matching and screening process is described in conjunction with FIG. 2. In step 201 the user initiates or starts the screening process. The screening process allows the user of the present invention to determine whether information about a customer, partner, or affiliate matches information of parties contained within a database. This may be useful for determining whether or not the customer, partner, or affiliate is a restricted party, a pre-registered user, or the like. The user would enter the query or third party information either prior to step 201 or after initiation of the screening process, as seen in step 202.

The user can also select the various sections and options which apply to the screening process in step 203. The user-selected sections and options include choices for displaying various matches, including known matches, dictionaries, matches, potential matches, and match words which relate to the sections. The user may also select which information is used in determining a match, which could include name, address, phone, fax, e-mail, and other information. After the sections and options for the screening process are determined, the present invention analyzes the information from the third party against data contained within a database.

In step 205, the present invention extracts a set of attributes from the query information, one for each potential match area, such as name, address, city, state, zip, etc. The present invention then tokenizes the information within each attribute in step 207. The extraction and tokenization of these attributes is performed by unique and specialized code, as will be described in more detail later. An inspection and examination of each match data record and attribute for unusual characteristics about which the user needs to be notified can be performed, as seen in step 209. The unusual words dictionary, as will be described in more detail below, is used by the present invention for this purpose.

To find potential matches, in a pre-screening step 211 the extracted and tokenized attributes of the query are matched against tokenized attributes of the saved records contained within a database. The various linguistic mapping tools, clarifiers, and algorithms which may be incorporated in the pre-screening 211 include at least: Metaphone 213, Phonex 215, Soundex 217, Alphabetic N-gram 219, Consonant N-gram 221, Numeric N-gram 223, Fdi N-gram 225, and Fml N-gram 227. The analysis by the various linguistic pattern matching tools 213-227 may be performed sequentially or simultaneously. In the example shown in FIG. 2, the screenings are done sequentially. The pre-screening 211 is generally very broad in nature, attempting to find all possible matches of all attributes, and provides a Broad Subset of Potential Attribute Matches, as seen in step 229.

Each potential match within the Broad Subset of Potential Matches 229 is then analyzed by the detail matching process of step 231. In the detail matching process 231, for each potential match, the number and type of word matches may be determined using Metaphone 233, Phonex 235, Soundex 237, Edit Distance 239, and the various Dictionaries 241, which are all used to determine how many words match exactly and approximately. The information in all of the Dictionaries 241 is used in performing these word matches. This includes the common words, distinct words, etc. It does not include the unusual words dictionary, although such could be incorporated and still be within the scope of the present invention. The resulting word match information is used by specialized algorithms, unique to each type of attribute, to determine if a potential match is an actual match. For name matching there can also be an additional algorithmic step to determine if the initials match.

All matches found by the various pre-screening 211 and detail matching 231 linguistic pattern matching tools are still categorized by attribute (names, address, phone, etc. in this example). In step 250 all of the records with individual attribute matches are combined. In step 260 the combined set of matches is filtered to remove undesirable matches in accordance with the selections established in step 203. In a preferred embodiment, the primary application uses “negative” matches on location to filter undesirable matches. The present invention can employ various methods to recover matches that would otherwise be false negatives while limiting false positives. Ultimately, the results are displayed in step 265 and the matching process is terminated in step 270.
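Steps 250 and 260 might be sketched in Python as follows. The data shapes and function names are hypothetical. The excluded_ids argument only stands in for whatever the “negative” location matching rejects, because the text does not specify how that matching works.

```python
from collections import defaultdict


def combine_attribute_matches(matches_by_attribute: dict) -> dict:
    """Step 250 (sketch): invert {attribute: record ids} into
    {record id: attributes on which that record matched}."""
    combined = defaultdict(set)
    for attribute, record_ids in matches_by_attribute.items():
        for rec_id in record_ids:
            combined[rec_id].add(attribute)
    return dict(combined)


def filter_matches(combined: dict, enabled_attributes: set,
                   excluded_ids: frozenset = frozenset()) -> dict:
    """Step 260 (sketch): keep only matches on attributes selected in step 203,
    and drop records rejected by a separate check (e.g. a hypothetical
    negative location match)."""
    kept = {}
    for rec_id, attributes in combined.items():
        relevant = attributes & enabled_attributes
        if relevant and rec_id not in excluded_ids:
            kept[rec_id] = relevant
    return kept


matches = {"name": {1, 2, 3}, "address": {2}, "phone": {3}}
combined = combine_attribute_matches(matches)
print(filter_matches(combined, {"name", "address"}, excluded_ids=frozenset({1})))
```

Here record 1 is removed by the stand-in exclusion, and record 3 survives only on its name match because phone matching was not selected.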

The present invention uniquely combines the elements, linguistic pattern matching analytical tools, and results of each analysis. Further, the particular ways in which the present invention tokenizes words, extracts attributes, and uses the various dictionaries are unique. Further, the particular dictionaries used, the filtering and negative matching, and the particular data formats and commands used for socket communication and for XML socket communication are unique. The manner in which these various features are parameterized is also unique. In a preferred embodiment the focus is on reducing the number of false negatives with the minimal possible impact on the number of false positives.

For installation of the preferred embodiment of the present invention, the operating system resources could include 1 to 3 megabytes of hard disk storage for the program and data files. An allocation of 5 or more megabytes should provide a safe reserve for installing the present invention. If the present invention is to be used as a TCP/IP socket server, then TCP/IP sockets must be available. If the present invention is to be used as an SAP server, then SAP must be available, and a version of the present invention which provides the SAP interface must be installed. The SAP interface is currently only available for Windows NT 4.0 and HP-UX 10.2. All versions of the present invention support a socket client/server interface.

When executing, the present invention may require internal memory of approximately 15 times the data file size, plus the size of the executable program. In practice, this is around 12 megabytes if only Restricted Party Matching is to be used, and 20 megabytes if both Restricted Party Matching and Partner Matching are to be used. In Restricted Party Matching mode, the present invention attempts to determine if a particular entity exists in the Restricted Party List. In Partner Matching mode, the present invention attempts to determine if a known Restricted Party exists in a company's list of business partners.

To properly install the present invention, several files may be required. A description of key files follows, as well as a description of many of the functions controlled by those files. Although specific file names, file locations, function names, parameter names, and the like are given, it is understood that such names and locations may be changed without departing from the scope or spirit of this application.

Denper is the primary application, or executable file, and currently ranges in size from 300 kB on Windows NT to around 2 MB on Unix. Denper.ini is a configuration file which may affect the operation of the primary application. Denper.ini contains various options used during testing and evaluation, as well as data file names, their associated configuration files, and other such information. Denper.ini and the other configuration files described herein may be text files or other user-editable file types, and may be organized in a manner similar to standard Windows .ini files.

As with standard Windows .ini files, blank lines are ignored, leading white space on a line is ignored, and everything that follows a # (number sign) is ignored when denper.ini or other configuration files are processed by the present invention. Remaining lines may be treated as section headers or content lines. Section header lines may take the format [section] or [attribute.section]. Content line format is dependent upon the containing section. Content lines need not be in a specific order within a section. In addition, except for the [end] section, section order within a configuration file is not significant. However, an [end] section must be the last section in the .ini file, as an [end] section indicates the end of the configuration file.
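The processing rules above can be expressed as a small reader. This Python sketch is an illustration under stated assumptions, not the code used by Denper: it treats content before the first header and a missing [end] section as errors, and it returns raw content lines so that each section can interpret them in its own format. The option names in the sample come from this document; the helper names parse_config and parse_options are hypothetical.

```python
def parse_config(text: str) -> dict:
    """Return {section name: [content lines]} following the rules above."""
    sections = {}
    current = None
    for raw in text.splitlines():
        # Drop everything after '#', then leading (and trailing) white space.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()  # "options" or e.g. "name.indexTuning"
            if current == "end":
                return sections  # [end] terminates the file
            sections.setdefault(current, [])
        elif current is None:
            raise ValueError(f"content line outside any section: {line!r}")
        else:
            sections[current].append(line)
    raise ValueError("configuration file has no [end] section")


def parse_options(lines: list) -> dict:
    """Split key=value lines; 'true'/'false' values become booleans."""
    options = {}
    for line in lines:
        key, _, value = line.partition("=")
        value = value.strip()
        if value.lower() in ("true", "false"):
            options[key.strip()] = value.lower() == "true"
        else:
            options[key.strip()] = value
    return options


SAMPLE = """
# denper.ini (illustrative)
[options]
   displayKnownMatches=false   # testing aid; false in production
displayMatches=false
restrictedPartyListTuning=restrict.ini
restrictedPartyListName=restrict.bin

[end]
text after [end] is never read
"""

print(parse_options(parse_config(SAMPLE)["options"]))
```

Keeping section parsing separate from key/value parsing matches the rule that content line format depends on the containing section.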

The denper.ini configuration file may consist of two or more sections, including [options] and [end]. The [options] section contains a series of key/value pairs. For options which may be enabled or disabled, the value portion of a key/value pair associated with that option may be “true” or “false,” respectively. Such key/value pairs supported by the [options] section include:

displayKnownMatches=true/false

displayDictionaries=true/false

displayMatches=true/false

displayPotentialMatches=true/false

displayMatchWords=true/false

These key/value pairs may be used for testing during algorithm development. In a preferred embodiment, all values should be set to “false” when the present invention is used in production.

In addition to the true/false options, the present invention may also support customizable options. Such options include:

partnerListTuning=partner.ini

partnerListName=partner.bin

restrictedPartyListTuning=restrict.ini

restrictedPartyListName=restrict.bin

The key/value pairs listed above may specify the partner list configuration and data files, and the restricted party list configuration and data files. Default values used in a preferred embodiment are shown above; however, other file names and paths may be used.

In a preferred embodiment, if only restricted party matching is required, the partner.bin file may be zero bytes in length. Alternatively, if only partner matching is desired, the restrict.bin file may be zero bytes in length.

Partner.ini is a configuration file which affects Partner Matching. This file may be required even when Partner Matching is not used. Partner.ini may be a text file or other user-editable file type. Partner.ini may contain key/value pairs similar to those of restrict.ini, which are outlined below.

Partner.bin is a data file used for Partner Matching. This file may be required even when Partner Matching is not used, but may be zero length. This is a binary file which may not be easily edited by a user.

Restrict.ini is a configuration file which affects Restricted Party Matching and which may include, but is not limited to, the following sections:

[options]

[name.indexTuning]

[name.matchTuning]

[name.commonWords]

[name.distinctWords]

[name.unusualWords]

[name.synonyms]

[name.wordFragments]

[address.matchTuning]

[address.commonWords]

[address.distinctWords]

[address.unusualWords]

[address.synonyms]

[phone.indexTuning]

[phone.matchTuning]

[fax.indexTuning]

[fax.matchTuning]

[E-mail.indexTuning]

[E-mail.matchTuning]

[end]

The [options] section contains a series of key/value pairs, including:

matchName=true/false

matchAddress=true/false

matchPhone=true/false

matchFax=true/false

matchE-mail=true/false

Such key/value pairs may enable or disable matching on each supported attribute (name, address, phone, fax, and e-mail).

The [name.indexTuning] section contains a series of key/value pairs, including:

indexByMetaphone=true/false

indexByPhonex=true/false

indexBySoundex=true/false

indexByAlphabeticNgram=true/false

indexByConsonantNgram=true/false

indexByNumericNgram=true/false

indexByFdiNgram=true/false

indexByFmlNgram=true/false

These key/value pairs may enable or disable individual indexing types used by the matching algorithm. As the number of indexes used increases, the likelihood of finding potential word matches also increases. However, as the number of indexes used increases, the storage space and processing time required by the matching algorithm also increase.

The first three keywords, ‘indexByMetaphone’, ‘indexByPhonex’, and ‘indexBySoundex’, use phonetic word indexing methods similar to those in the prior art. These work well for names containing Arabic (i.e. English) characters, but do not work well for non-Arabic based names.
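For comparison with the N-gram methods that follow, a minimal Python sketch of the classic American Soundex rules is shown below. Metaphone and Phonex are not reproduced. This is the textbook algorithm, not necessarily the exact variant used by the present invention. The length parameter loosely mirrors the soundexLength key (default 4). Letters outside a to z receive no code, so names written in other alphabets lose most of their information. That is the weakness noted above.

```python
# American Soundex letter classes; vowels, 'y', 'h' and 'w' have no code.
_CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(word: str, length: int = 4) -> str:
    """Return the Soundex key of word, padded with zeros or truncated to length."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")  # the first letter's code counts for adjacency
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "hw":  # 'h' and 'w' do not separate equal codes; vowels do
            prev = code
    return (letters[0].upper() + "".join(digits) + "0" * length)[:length]


print(soundex("Michael"), soundex("Mikhail"))  # M240 M240
```

“Michael” and “Mikhail” share the key M240 only because Soundex places ‘c’ and ‘k’ in the same letter class.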

Other keywords may use N-grams for word indexing. An N-gram is a subsequence of N characters from the full word. There are several different ways to select the set of particular subsequences to be used for indexing.

An Alphabetic N-gram 219 chooses all possible contiguous subsequences of length N where the characters are alphabetic. A Consonant N-gram 221 chooses all possible contiguous subsequences of length N where the characters are consonants, with duplicate successive characters and non-consonants deleted. A Numeric N-gram 223 chooses all possible contiguous subsequences of length N where the characters are digits, with all non-digits deleted. An Fdi N-gram 225 chooses subsequences in which the first character of the word is always the first alphabetic character of the N-gram, and the remaining characters of the N-gram are all possible alphabetic subsequences of (N−1) contiguous characters following the first character of the word. An Fml N-gram 227 generates N-grams in which the first and last alphabetic characters of the N-gram are the same as the first and last characters of the word. The middle characters are all possible subsequences of (N−2) contiguous alphabetic characters from between the first and last characters of the word. The example contained within FIG. 2 shows the indexing techniques or linguistic patterns for each one of the N-gram linguistic pattern analytical matching tools 219, 221, 223, 225, 227. Additional examples of these indexing techniques, where N=3, include:

indexByAlphabeticNgram

helloworld—hel, ell, llo, low, owo, wor, orl, rld

indexByConsonantNgram

helloworld—hlw, lwr, wrl, rld

indexByNumericNgram

6619006—661, 619, 190, 900, 006

indexByFdiNgram

helloworld—hel, hll, hlo, how, hwo, hor, hrl, hld

indexByFmlNgram

helloworld—hed, hld, hod, hwd, hrd (note that duplicate indices are discarded)
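The five N-gram types can be sketched in Python as follows. The function names are hypothetical. Several details are assumptions, not statements from the text:

- input is lowercased;
- ‘y’ counts as a consonant;
- duplicate consonants are collapsed after vowels are removed;
- duplicate indices are discarded for every type, although the text notes this only for fml.

With N=3 the functions produce the alphabetic, consonant, numeric, and fdi examples above. For fml they yield hed, hld, hod, hwd, hrd.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def _windows(s: str, n: int) -> list:
    """All contiguous substrings of length n (empty if n < 1)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)] if n >= 1 else []


def _unique(items) -> list:
    """Drop duplicate indices, keeping first-seen order."""
    return list(dict.fromkeys(items))


def _letters(word: str) -> str:
    return "".join(c for c in word.lower() if c.isalpha())


def alphabetic_ngrams(word: str, n: int = 3) -> list:
    return _unique(_windows(_letters(word), n))


def consonant_ngrams(word: str, n: int = 3) -> list:
    cons = [c for c in _letters(word) if c not in VOWELS]
    # Delete duplicate successive consonants ("ll" -> "l").
    collapsed = "".join(c for i, c in enumerate(cons) if i == 0 or c != cons[i - 1])
    return _unique(_windows(collapsed, n))


def numeric_ngrams(word: str, n: int = 3) -> list:
    return _unique(_windows("".join(c for c in word if c.isdigit()), n))


def fdi_ngrams(word: str, n: int = 3) -> list:
    s = _letters(word)
    # First letter, then each run of (n-1) letters that follows it.
    return _unique(s[0] + w for w in _windows(s[1:], n - 1)) if s else []


def fml_ngrams(word: str, n: int = 3) -> list:
    s = _letters(word)
    if len(s) < 2:
        return []
    # First letter + each interior run of (n-2) letters + last letter.
    return _unique(s[0] + w + s[-1] for w in _windows(s[1:-1], n - 2))


for fn in (alphabetic_ngrams, consonant_ngrams, fdi_ngrams, fml_ngrams):
    print(fn.__name__, fn("helloworld"))
print("numeric_ngrams", numeric_ngrams("6619006"))
```

Each function's n parameter plays the role of the corresponding *NgramLength key.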

In addition to the indexing techniques described above, the [name.indexTuning] section may also allow a user to customize certain index properties by changing values in key/value pairs. Examples of such key/value pairs include:

metaphoneLength=#

phonexLength=#

soundexLength=#

alphabeticNgramLength=#

consonantNgramLength=#

fdiNgramLength=#

fmlNgramLength=#

numericNgramLength=#

Such key/value pairs may set the number of characters used by each index. In a preferred embodiment, values smaller than the default may result in more comprehensive indexing, but may also result in too many matches to be useful. Values larger than the default result in poorer indexing. The default values for the key/value pairs listed above are:

metaphoneLength=4

phonexLength=4

soundexLength=4

alphabeticNgramLength=3

consonantNgramLength=3

fdiNgramLength=3

fmlNgramLength=3

numericNgramLength=3

The [name.matchTuning] section contains a series of key/value pairs, which include:

matchByMetaphone=true/false

matchByPhonex=true/false

matchBySoundex=true/false

matchByEditDistance=true/false

matchBySet=true/false

These key/value pairs may enable or disable various word matching techniques. Generally, the most effective technique is ‘matchByEditDistance’. The edit distance is the number of characters which have to be inserted, deleted, changed, or transposed in one word to obtain the second word. For example, the edit distance between ‘Michael’ and ‘Mikhail’ is 2 (replace ‘c’ by ‘k’ and ‘e’ by ‘i’). Matching by any of the phonetic based methods (such as Soundex) would fail if the ‘c’ and ‘k’ were not considered to be phonetically equivalent. Although phonetic based methods may work for names of English origin, non-English names are not well matched by such methods.
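Below is a minimal Python sketch of an edit distance that counts insertions, deletions, substitutions, and adjacent transpositions. It uses the optimal string alignment variant of Damerau-Levenshtein distance. The text does not say which variant the present invention uses, so treat that choice as an assumption. The comparison is case-sensitive.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(edit_distance("Michael", "Mikhail"))  # 2
```

Unlike a phonetic key, this measure needs no table of equivalent letters. ‘c’ versus ‘k’ simply costs one substitution, which is why it also applies to names of non-English origin.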

The [name.matchTuning] section may also contain key/value pairs which allow a user to customize word matching thresholds, including:

isWordMatchThresholdDynamic=true/false

wordMatchThreshold=#

If ‘isWordMatchThresholdDynamic’ is true then ‘wordMatchThreshold’ is ignored. If ‘isWordMatchThresholdDynamic’ is false, then the value associated with ‘wordMatchThreshold’ becomes the minimum number of words which must match for the search and comparison strings to be considered a match. The dynamic option increases the number of required word matches as the number of words in a search string increases. The dynamic option also allows imperfect matches to be considered.
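The Python sketch below is a minimal version of the threshold check. It counts how many search words find a matching word in the comparison string, then compares that count with ‘wordMatchThreshold’. The word comparison is passed in as a function. The text does not state the dynamic formula, so the rule shown (at least half of the search words, rounded up) is a hypothetical placeholder. The sketch also does not model the imperfect matches that the dynamic option permits.

```python
import math


def required_matches(num_search_words, threshold, dynamic):
    """Minimum number of matching words needed for a string-level match."""
    if not dynamic:
        return threshold  # isWordMatchThresholdDynamic=false
    # Hypothetical dynamic rule: grows with the number of search words.
    return max(1, math.ceil(num_search_words / 2))


def meets_threshold(search_words, candidate_words, word_match,
                    threshold=1, dynamic=False):
    """True when enough search words match some candidate word (any order)."""
    matched = sum(1 for s in search_words
                  if any(word_match(s, c) for c in candidate_words))
    return matched >= required_matches(len(search_words), threshold, dynamic)


same = lambda a, b: a.lower() == b.lower()
print(meets_threshold(["Thomas", "Lee"], ["Lee", "Thomas"], same, threshold=2))  # True
```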

The [name.matchTuning] section may further allow word match sensitivity to be customized through key/value pairs such as:

isWordMatchSensitivityDynamic=true/false

wordMatchSensitivity=#

If ‘isWordMatchSensitivityDynamic’ is true then ‘wordMatchSensitivity’ is ignored. If ‘isWordMatchSensitivityDynamic’ is false, then ‘wordMatchSensitivity’ is the maximum edit distance between words allowed for the words to be considered a ‘match’. Values smaller than 2 allow only very minor variations in word spelling for a match, while values larger than 2 generally match too many words to be useful. The dynamic option increases the allowed edit distance based on the length of the words being matched. Thus, small edit distance thresholds are used for short words and larger edit distance thresholds are used for longer words. The dynamic option generally provides an optimum choice for matching words.

The [name.matchTuning] section may also contain wordPrefixDifferencePenalty=#, a key/value pair which specifies a penalty to be added when the first character in each compared word differs. This penalty may be desirable because words which are recognizable variants typically start with the same letter, and because the first letter of a word is typically entered correctly, even when the word is misspelled. In a preferred embodiment, this penalty may be set to 3.
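The Python sketch below shows one way the sensitivity setting and ‘wordPrefixDifferencePenalty’ could combine for a single word pair. It assumes the penalty is added to the edit distance before the result is compared with the allowed maximum. The edit distance is taken as an input. The dynamic length rule is a hypothetical stand-in for the unspecified formula and only illustrates that longer words tolerate more edits.

```python
def allowed_distance(word_length, sensitivity, dynamic):
    """Maximum edit distance accepted for a word pair."""
    if not dynamic:
        return sensitivity  # wordMatchSensitivity
    # Hypothetical dynamic rule: one edit per four letters, capped at 3.
    return min(word_length // 4, 3)


def is_word_match(a, b, distance, sensitivity=2, dynamic=False,
                  prefix_penalty=3):
    """Add the prefix penalty when the first letters differ, then compare."""
    score = distance
    if a[:1].lower() != b[:1].lower():
        score += prefix_penalty  # wordPrefixDifferencePenalty
    longest = max(len(a), len(b))
    return score <= allowed_distance(longest, sensitivity, dynamic)


print(is_word_match("Michael", "Mikhail", distance=2))  # True
print(is_word_match("Michael", "Nichael", distance=1))  # False: 1 + 3 > 2
```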

The [name.matchTuning] section may also allow set match sensitivity to be customized through key/value pairs such as:

isSetMatchSensitivityDynamic=true/false

setMatchSensitivity=#

These values are similar to those used for word matching, but may apply to sets of words. When applied to sets, sensitivity may be calculated as an average allowed per word (i.e. as the number of words increases, more differences may be allowed).
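The Python sketch below shows one reading of the per-word average. It assumes each word in the set has already been paired with its best counterpart. Under that assumption, the total edit distance over the set may not exceed ‘setMatchSensitivity’ multiplied by the number of words, so longer sets tolerate more differences in total. The function name is hypothetical.

```python
def is_set_match(word_distances, set_sensitivity):
    """word_distances: best edit distance found for each word in the set."""
    if not word_distances:
        return False
    allowed_total = set_sensitivity * len(word_distances)
    return sum(word_distances) <= allowed_total


print(is_set_match([1, 0], set_sensitivity=1))     # True: 1 <= 2
print(is_set_match([2, 2, 0], set_sensitivity=1))  # False: 4 > 3
```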

The [name.commonWords] section contains a common words dictionary, which lists words occurring so frequently as to be essentially meaningless. Words in this section may be ignored by the matching process. Each line in this section may take the format:

word, word, word, . . . , word

for example:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9

a, b, c, d, e, f, g, h, i, j, k, l, m

n, o, p, q, r, s, t, u, v, w, x, y, z

and, or, of, the

associates, group

company
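The Python sketch below is a minimal version of this step. It loads the comma-separated dictionary lines shown above and removes common words from a tokenized name before matching. The helper names are hypothetical, and lowercasing all words is an assumption.

```python
def load_word_set(lines):
    """Collect every comma-separated word from the dictionary lines."""
    words = set()
    for line in lines:
        words.update(w.strip().lower() for w in line.split(",") if w.strip())
    return words


COMMON_WORDS = load_word_set(["and, or, of, the", "associates, group", "company"])


def drop_common_words(tokens, common=COMMON_WORDS):
    """Remove tokens that carry no matching value."""
    return [t for t in tokens if t.lower() not in common]


print(drop_common_words(["Lee", "and", "Associates"]))  # ['Lee']
```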

The [name.distinctWords] section contains a distinct words dictionary and lists pairs of words which the matching algorithm normally considers to match, but which client experience has found are not matches. Each line in this section has the format:

word, word

for example:

america, africa

holding, holiday

marine, machine

sea, sky

which have edit distances of 2, 3, 2 and 2, respectively. The matching algorithm may consider these words to be the same, but closer examination of such matches suggests that one word of each pair is rarely a misspelling of the other.
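The distinct words dictionary acts as a veto on an otherwise acceptable edit distance. The Python sketch below uses hypothetical names and takes the edit distance and sensitivity as inputs. It stores pairs without regard to order, and its parser accepts either commas or spaces between the two words.

```python
def load_distinct_pairs(lines):
    """Each line holds two words; store them as an unordered pair."""
    pairs = set()
    for line in lines:
        words = line.replace(",", " ").lower().split()
        if len(words) == 2:
            pairs.add(frozenset(words))
    return pairs


DISTINCT = load_distinct_pairs(
    ["america, africa", "holding, holiday", "marine, machine", "sea, sky"])


def words_match(a, b, distance, sensitivity=2, distinct=DISTINCT):
    """Reject listed pairs even when their edit distance is small enough."""
    if frozenset((a.lower(), b.lower())) in distinct:
        return False
    return distance <= sensitivity


print(words_match("America", "Africa", distance=2))  # False: listed as distinct
print(words_match("Thomas", "Tomas", distance=1))    # True
```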

The [name.unusualWords] section contains an unusual words dictionary and lists words which are sufficiently unique that they cause any match candidate that contains an unusual word to match. Each line in this section has the format:

word

for example:

Saddam

The [name.synonyms] section contains a synonym dictionary, which lists words that are considered to match even though the matching process would not normally consider the words to match. Each line in this section has the format:

word, word, . . . , word->word

where each of the words to the left of the arrow will be replaced by the word on the right of the arrow before matching. This section can also be used to correct known misspellings and to handle nicknames and name variants. For example:

bill->william

dr->doctor

holdings->holdings

industria, industrie->industry

iraqi, iraquiano->iraq

irrrran, iranische->iran

mike->michael

rob, bob, bobby->robert

oil, petrol->petroleum
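The Python sketch below turns the ‘word, word->word’ lines into a replacement table and applies it to tokens before matching. The names are hypothetical, and lowercasing is an assumption.

```python
def load_synonyms(lines):
    """Map every word left of '->' to the word on its right."""
    table = {}
    for line in lines:
        left, sep, right = line.partition("->")
        if not sep:
            continue  # not a synonym line
        target = right.strip().lower()
        for word in left.split(","):
            if word.strip():
                table[word.strip().lower()] = target
    return table


SYNONYMS = load_synonyms(
    ["bill->william", "mike->michael", "rob, bob, bobby->robert", "oil, petrol->petroleum"])


def apply_synonyms(tokens, table=SYNONYMS):
    """Replace each token by its canonical form, if one is listed."""
    return [table.get(t.lower(), t.lower()) for t in tokens]


print(apply_synonyms(["Bobby", "Smith"]))  # ['robert', 'smith']
```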

The [name.wordFragments] section contains a word fragments dictionary, which describes words that are likely to be fragmented. The selection of these word fragments may be based on client experience, database contents, and other factors. Each line in this section may take the form:

fragment, fragment, . . . , fragment->word

for example:

al-khalij, al-arabi->al-khalij-al-arabi

gold, star->goldstar

import, export->export-import

import, export->import-export

import, export->importexport

These examples show that the same word fragments can be combined in different ways, thereby allowing hyphenated names to be matched. Normally, word ordering is not significant in the matching process. However, when a word is hyphenated, word order becomes significant; therefore, it is sometimes desirable to list the hyphenated word in all expected word orderings.
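The Python sketch below shows one possible way to use these rules. When every fragment of a rule appears among the tokens of a name, in any order, the combined word or words are added as extra tokens. In this way ‘Import Export’ can be compared with a record containing ‘export-import’, ‘import-export’ or ‘importexport’. The function names are assumptions, and so is the choice to add combined words rather than replace the fragments.

```python
def load_fragment_rules(lines):
    """Parse 'fragment, fragment->word' lines into (fragments, word) pairs."""
    rules = []
    for line in lines:
        left, sep, right = line.partition("->")
        if sep:
            frags = frozenset(f.strip().lower() for f in left.split(",") if f.strip())
            rules.append((frags, right.strip().lower()))
    return rules


RULES = load_fragment_rules([
    "gold, star->goldstar",
    "import, export->export-import",
    "import, export->import-export",
    "import, export->importexport",
])


def expand_fragments(tokens, rules=RULES):
    """Append the combined words whose fragments are all present."""
    out = [t.lower() for t in tokens]
    present = set(out)
    for frags, word in rules:
        if frags <= present and word not in out:
            out.append(word)
    return out


print(expand_fragments(["Import", "Export", "Bank"]))
# ['import', 'export', 'bank', 'export-import', 'import-export', 'importexport']
```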

The [address.matchTuning] section contains a series of key/value pairs, which include:

matchByMetaphone=true/false

matchByPhonex=true/false

matchBySoundex=true/false

matchByEditDistance=true/false

matchBySet=true/false

These key/value pairs may control how addresses are matched. The address matching algorithm may function similarly to the previously described word matching algorithm.

The [address.matchTuning] section also contains key/value pairs which control match thresholds, including:

isAddressMatchThresholdDynamic=true/false

addressMatchThreshold=#

These key/value pairs set the threshold for the number of words which must match for two attributes to match. If ‘isAddressMatchThresholdDynamic’ is true then ‘addressMatchThreshold’ may be ignored. If ‘isAddressMatchThresholdDynamic’ is false, then ‘addressMatchThreshold’ is the minimum number of words which must match for the compared words to match. The dynamic option increases the number of required exact word matches as address length increases, and also allows imperfect matches to be considered.

Also included in the [address.matchTuning] section are key/value pairs which control address match sensitivity when using the Edit Distance comparison method, such as:

isAddressMatchSensitivityDynamic=true/false

addressMatchSensitivity=#

If ‘isAddressMatchSensitivityDynamic’ is true then ‘addressMatchSensitivity’ may be ignored. If ‘isAddressMatchSensitivityDynamic’ is false, then ‘addressMatchSensitivity’ is the maximum edit distance between addresses allowed for the addresses to be considered a ‘match’. Values smaller than 2 allow only minor variations in address spelling for a match, while values larger than 2 may match too many addresses to be useful. The dynamic option increases the allowed edit distance based on the length of the addresses being matched. Thus, the dynamic option uses small edit distance thresholds when comparing short addresses, and larger edit distance thresholds for longer addresses.

The [address.matchTuning] section also contains a key/value pair, addressPrefixDifferencePenalty=#, which is a penalty added when the first characters differ between compared words, as with the previously described wordPrefixDifferencePenalty.

The [address.matchTuning] section may further contain a series of key/value pairs controlling match sensitivity, including:

isSetMatchSensitivityDynamic=true/false

setMatchSensitivity=#

These key/value pairs function similarly to those for word set matching, but may be applied to address set matching.

The [address.commonWords] section contains a common words dictionary which lists words that occur frequently but, for comparison purposes, are essentially meaningless. Words in this section may be ignored by the matching process. The [address.commonWords] section may contain data of a structure similar to [name.commonWords], and may serve a similar purpose when evaluating addresses.

The [address.distinctWords] section contains a distinct words dictionary, which lists pairs of words that the matching algorithm normally considers as matching, but which client experience has found are not matches. The [address.distinctWords] section may contain data of a structure similar to [name.distinctWords], and may serve a similar purpose when evaluating addresses.

The [address.unusualWords] section contains an unusual words dictionary, which lists words that are sufficiently unique that any address match candidate containing an unusual word should be considered a match. The [address.unusualWords] section may contain data of a structure similar to [name.unusualWords], and may serve a similar purpose when evaluating addresses.

The [address.synonyms] section contains a synonym dictionary, which lists words that are considered to match even though the matching process would not normally consider the words to match. Each line in this section has the format:

word, word, . . . , word->word

where each of the words to the left of the arrow will be replaced by the word on the right of the arrow before matching. This section can also be used to correct known misspellings and to handle abbreviations and other address variants. Examples of such synonyms include:

av, ave->avenue

st->street

first->1st

second->2nd

third->3rd

fourth->4th

fifth->5th

sixth->6th

seventh->7th

w.->west

e.->east

n.->north

s.->south

ste->suite

rd->road

dr->drive

no, no.->number

bldg->building

The [phone.indexTuning] section contains a series of key/value pairs which control phone number matching. Such key/value pairs include:

indexByNumericNgram=true/false

The index is used to find phone numbers during the matching process. An N-gram is a subsequence of N characters from the full word.

A numeric N-gram chooses all possible contiguous subsequences of length N where the characters are digits with all non-digits deleted. By way of an example, without intending to limit the present invention, the number 6619006 may be converted to N-grams of 661, 619, 190, 900, and 006, when N=3.

The [phone.indexTuning] section also contains numericNgramLength=#, a key/value pair which sets the number of digits used for numeric N-Grams. The default value is 3.

The [phone.matchTuning] section contains a series of key/value pairs which control matching of telephone numbers, including:

matchByEditDistance=true/false

Generally, the only effective phone number matching technique is ‘matchByEditDistance’. The edit distance is the number of digits which have to be inserted, deleted, changed or transposed in one phone number to obtain the second phone number.

The [phone.matchTuning] section also includes key/value pairs such as:

isPhoneMatchSensitivityDynamic=true/false

phoneMatchSensitivity=#

These key/value pairs set the sensitivity for phone number matching using the edit distance. If ‘isPhoneMatchSensitivityDynamic’ is false, then ‘phoneMatchSensitivity’ is the maximum edit distance between phone numbers allowed for the phone numbers to be considered to ‘match’. Values smaller than 2 allow only very minor variations in phone numbers for a match, while values larger than 2 generally match too many phone numbers to be useful.

The [fax.indexTuning] section contains a series of key/value pairs which are similar to those in the [phone.indexTuning] section.

The [E-mail.indexTuning] section contains a series of key/value pairs which control the indexing of E-mail addresses, including:

indexByAlphabeticNgram=true/false

indexByConsonantNgram=true/false

indexByNumericNgram=true/false

An index may be used to find E-mail addresses during the matching process. As the number of indexing methods is increased, the number of potential matches may increase. However, as more E-mail indexing methods are used, the matching algorithm may require more space and processing time.

The key/value pairs of the [E-mail.indexTuning] section may use N-Grams for E-mail address indexing. N-gram indexing is described in detail above.

The [E-mail.matchTuning] section may contain a series of key/value pairs, including:

matchByEditDistance=true/false

isEmailMatchSensitivityDynamic=true/false

emailMatchSensitivity=#

These key/value pairs control E-mail address matching. Generally, the only effective technique is ‘matchByEditDistance’. The edit distance is the number of characters which have to be inserted, deleted, changed or transposed in one E-mail address to obtain the second E-mail address.

The [E-mail.matchTuning] key/value pairs control E-mail address edit distance matching, and set the sensitivity for edit distance E-mail address matching. If ‘isEmailMatchSensitivityDynamic’ is false, then ‘emailMatchSensitivity’ is the maximum edit distance between E-mail addresses allowed for the E-mail addresses to be considered to ‘match’. Values smaller than 2 allow only very minor variations in E-mail addresses for a match, while values larger than 2 generally match too many E-mail addresses to be useful.

The [end] section indicates the end of the configuration file. No data follows this section.

Restrict.bin is a data file used for Restricted Party Matching. It may be required even when Restricted Party Matching is not used, in which case it may be zero length. This is a binary file not intended to be edited by the user.

In a preferred embodiment of the present invention, the files listed above and containing the parameters outlined above could be installed in the same directory. It is further preferred that the present invention not share a directory with other applications. When using a socket server, data files may be reloaded or replaced. Any such replacements could be placed in the same directory as the other files.

Once installed, the present invention may be executed from a command line with a command such as “denper -r -h”, which will start the present invention as a socket server using default options. Command line options supported by the present invention include:

-? Display command line help

-a address Address of socket server host, default: “localhost:20787”

-a nnnn Socket server port, defaults to 20787

-b Basic match -- ignore dictionaries

-c options Name of configuration file, defaults to “the present invention.ini”

-d restrictedList Name of file containing restricted party list

-f Fixed length fields in restricted party records

-h Host socket interface (only -a, -c, -d, -f, -t and -v options may be used with -h)

-i inputFile Name of input file for batch and update modes

-l newList Name of new list

-lf newList Name of new text list

-lv newList Name of new binary list

-n “name” Name to be matched

-o outputFile Name of output file for batch and update modes

-r Match complete record

-t Trace

-v Variable length fields in binary restricted party records

-x Client socket interface

-P Dump partner list

-R Dump restricted party list

-U Update restricted party list

-X Display performance information

The following are sample commands which may be used to start the present invention, as well as brief descriptions of some of the functionality gained by using a command line option. For example, a command of “denper -a nnnn -h” may start the present invention as a socket server listening on port nnnn.

A further example is the command “denper -a host:nnnn -r -x -i matchData.txt -o results.txt”, which may start the present invention as a socket client in batch mode, where the present invention is running as a socket server on host and listening on port nnnn. In addition, the previously illustrated command line may use complete record matching, as indicated by option -r. Batch mode allows the present invention to read comparison data from the file named by the -i option, here “matchData.txt”; the match results may be written to the file named by the -o option, here “results.txt.” By default, when the -i and -o options are omitted, the present invention may use “stdin” and “stdout” for input and output, respectively.

A command line of “denper -s” may start the present invention as a SAP server.

A command line of “denper -r -i matchData.txt -o results.txt” may start the present invention in batch mode. Complete record matching will be used (this is indicated by the -r option, which is normally used when running the present invention in batch mode). Match data may be stored in the file specified as “matchData.txt,” and match results may be written to the file specified as “results.txt.” By default, when the -i and -o options are omitted, the present invention uses “stdin” and “stdout” for input and output, respectively.

A command line of “denper -n ‘name to match’” may start the present invention and attempt to match a single name. The results may be written to “stdout.”

A command line of “denper -a host:nnnn -x -n ‘name to match’” may start the present invention as a socket client and attempt to match a single name. The results may be written to “stdout”, or redirected to a file by adding a -o option.

In addition to command line options and configuration files, the present invention may also receive input from environment variables or other such inter-process communications methods. For example, the present invention may read the environment variable EMS_DENPER. EMS_DENPER may contain a port number, when the present invention is used as a socket server, or a host name and port number when the present invention is used as a socket client.

If EMS_DENPER is not present, the -a command line option may be used for the same purpose. By way of example, without intending to limit the present invention, if the EMS_DENPER environment variable is not present, the present invention may be started using a command line similar to “denper -a nnnnn -h”, where nnnnn is the socket port to be used. If no socket port is given, the present invention may provide a default for the socket port such as 20787.
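The address resolution described above can be sketched in Python as follows. The precedence follows this paragraph: EMS_DENPER first, then the -a value, then the default port 20787. The function name resolve_address is hypothetical. Treating a bare port number as a local address is an assumption.

```python
import os

DEFAULT_PORT = 20787


def resolve_address(cli_address=None, env=None):
    """Return (host, port) from EMS_DENPER, the -a option, or the default."""
    env = os.environ if env is None else env
    value = env.get("EMS_DENPER") or cli_address
    if not value:
        return ("localhost", DEFAULT_PORT)
    host, sep, port = value.rpartition(":")
    if not sep:  # a port number only, as used by a socket server
        return ("localhost", int(value))
    return (host or "localhost", int(port))


print(resolve_address("20788", env={}))                      # ('localhost', 20788)
print(resolve_address(None, env={"EMS_DENPER": "srv:9000"}))  # ('srv', 9000)
```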

Once started, no explicit action may be necessary to stop the present invention. When running as a socket or SAP server, the present invention may be stopped by killing it as a process using appropriate system commands, such as control-c under Windows NT. Since the present invention is not intended to write files when used as a socket or SAP server, killing the present invention as a process should not cause data loss. When the present invention is used in batch mode from the command line, it may terminate after processing an input file, or when an empty match data line is received.

The preceding has described the present invention from a user perspective. Within a preferred embodiment, the present invention may store data in one or more record types. Such record types may include, but are not limited to, Address, BillingData, CountryMatchData, CountryRecord, PartnerKey, PartnerMatchData, PartnerRecord, PlaceNameKey, RestrictedPartyKey, RestrictedPartyMatchData, RestrictedPartyRecord, StateMatchData and StateRecord. Each record type may be converted from and into multiple formats, including, but not limited to, display-oriented, comma-separated values, variable length binary, fixed-length text, and XML records.

Records may be nested within other records, and each component of such a nested record may be converted from one format to another by recursively requesting conversion of each nested component. In a preferred embodiment, record conversion procedures may be distributed throughout the source code to those portions of the source code requiring such information. With this distributed architecture, conversion information need not be confined to a single location within the source code.
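The recursive conversion of nested records can be illustrated with a small Python sketch. It uses hypothetical, simplified Address and PartyRecord dataclasses and a single generic converter for the CSV and XML forms. That is a simplification: the text describes conversion logic distributed among the record types themselves.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Address:  # hypothetical, simplified nested record
    street: str
    city: str


@dataclass
class PartyRecord:  # hypothetical, simplified outer record
    name: str
    address: Address


def to_values(record):
    """Flatten a record into field values, recursing into nested records."""
    values = []
    for f in fields(record):
        value = getattr(record, f.name)
        values.extend(to_values(value) if is_dataclass(value) else [str(value)])
    return values


def to_csv_line(record):
    """Comma-separated values form; quoting is handled by the csv module."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(to_values(record))
    return buffer.getvalue().rstrip("\r\n")


def to_xml(record, tag=None):
    """XML form: each nested record becomes a nested element."""
    element = ET.Element(tag or type(record).__name__)
    for f in fields(record):
        value = getattr(record, f.name)
        if is_dataclass(value):
            element.append(to_xml(value, f.name))
        else:
            ET.SubElement(element, f.name).text = str(value)
    return element


record = PartyRecord("Thomas Lee", Address("1 Main St", "Springfield"))
print(to_csv_line(record))  # Thomas Lee,1 Main St,Springfield
print(ET.tostring(to_xml(record), encoding="unicode"))
```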

From a high level perspective, a preferred embodiment may implement a data flow similar to that outlined below:

A request may be received by a socket or other network interface method, and may be decoded, parsed, or otherwise processed. Components of the present invention or other, external applications, such as those tied to accounting, product registration, or other tools, may be queried before a request is allowed.

A new RestrictedPartyMatchData record may be created and may load itself from a match request.

RestrictedPartyMatcher may be asked for notifications triggered by RestrictedPartyMatchData and/or for all RestrictedPartyRecords matching RestrictedPartyMatchData.

Each matched RestrictedPartyRecord may be converted into a network interface-appropriate form, and a RestrictedPartyMatcher response is returned, via a socket interface, to a requester.

Before describing the process flow triggered by a RestrictedPartyMatchData or RestrictedPartyRecord record when requesting notifications from a RestrictedPartyMatcher, a brief description of some core concepts will be undertaken. The first concept is that of a Mapper. A Mapper is an object which converts another object into an array of one or more strings. By way of an example, without intending to limit the present invention, a Mapper may be used in a preferred embodiment to extract match data and record features, tokenize strings, and extract string indices.

When an inspection or match request is made, match data may be passed to an appropriate Mapper, which can return an array of strings representing match data features. Each string returned by a Mapper may have a meaning assigned to it by that Mapper. When an attempt is made to match against a particular record, the record may be passed to an appropriate Mapper which returns an array of strings representing record features.

The number of strings and the meaning of each string are identical to those of the string array returned for the corresponding match data. In a preferred embodiment, the present invention may implement a plurality of unique feature Mappers, including: AddressFeatures, CountryMatchDataFeatures, CountryRecordFeatures, PartnerMatchDataFeatures, PartnerRecordFeatures, RestrictedPartyMatchDataFeatures, RestrictedPartyRecordFeatures, StateMatchDataFeatures and StateRecordFeatures.

Also used within the present invention are Tokenizers, which are similar to Mappers. Like a Mapper, a Tokenizer may convert a string into an array of strings. Once a Mapper has extracted a feature from a record, that feature can be tokenized by passing it to an appropriate Tokenizer, which returns an array of strings representing the feature. In a preferred embodiment, inspections and matches may be performed against tokenized features. A plurality of Tokenizers may be implemented within the present invention, including: AddressTokenizer, EmailTokenizer, FaxTokenizer, NameTokenizer, PhoneTokenizer, PlaceNameTokenizer, PostalTokenizer and WebTokenizer.

Another core concept implemented in the present invention is a Classifier. A Classifier, like a Tokenizer, can convert a string into an array of strings. However, while a Tokenizer extracts “tokens” or “words” from a string, a Classifier extracts indices from a string, such strings typically representing a token. A plurality of Classifiers may be implemented in a preferred embodiment, including: AlphabeticNgramClassifier, ConsonantNgramClassifier, FdiNgramClassifier, FmlNgramClassifier, MetaphoneClassifier, NumericNgramClassifier, PhonexClassifier and SoundexClassifier.

Another core concept of the present invention is that of a Matcher. A Matcher is an object which, when presented with an object, may return a set of notification strings or a set of matched objects. There are two primary types of Matchers, record Matchers and feature Matchers. Record Matchers are passed match data and may return a set of notification strings or a set of matched records. Feature Matchers are passed feature strings (extracted from match data or records using feature Mappers) and may return notification strings or sets of matched feature strings.

A plurality of record Matchers may be implemented in a preferred embodiment, including: CountryMatcher, PartnerMatcher, RestrictedPartyMatcher and StateMatcher. In addition, a plurality of feature Matchers may also be implemented in a preferred embodiment, including: AddressMatcher, EmailMatcher, FaxMatcher, NameMatcher, PhoneMatcher, PlaceNameMatcher, PostalMatcher and WebMatcher.
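The Python sketch below shows how the four concepts could fit together. It chains a Mapper (record to feature strings), a Tokenizer (feature to words), a Classifier (word to index keys) and a record Matcher built on an inverted index. All names are hypothetical, and the records are plain dictionaries. Contiguous three-letter substrings stand in for whichever Classifiers are configured. The matcher only retrieves candidates; the finer analysis described elsewhere (edit distance, dictionaries) is not modeled.

```python
from collections import defaultdict


def party_mapper(record):
    """Mapper: record -> feature strings in a fixed order (name, phone)."""
    return [record.get("name", ""), record.get("phone", "")]


def name_tokenizer(feature):
    """Tokenizer: feature string -> lowercase words."""
    return feature.lower().replace(",", " ").split()


def trigram_classifier(token, n=3):
    """Classifier: token -> index keys (contiguous letter 3-grams)."""
    return [token[i:i + n] for i in range(len(token) - n + 1)] or [token]


class ToyRecordMatcher:
    """Record Matcher: indexes stored records, returns candidate records."""

    def __init__(self, records):
        self.records = records
        self.index = defaultdict(set)
        for rid, record in enumerate(records):
            for key in self._keys(record):
                self.index[key].add(rid)

    def _keys(self, record):
        name = party_mapper(record)[0]
        return {key for tok in name_tokenizer(name) for key in trigram_classifier(tok)}

    def match(self, match_data):
        hits = set()
        for key in self._keys(match_data):
            hits |= self.index.get(key, set())
        return [self.records[rid] for rid in sorted(hits)]


matcher = ToyRecordMatcher([{"name": "Thomas Lee"}, {"name": "Jane Doe"}])
print(matcher.match({"name": "Tomas Leigh"}))  # [{'name': 'Thomas Lee'}]
```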

A preferred embodiment of the present invention may also include additional Matcher types, such as SmarteClientMatcher and SocketClientMatcher. Such Matchers present an interface similar to that of a record Matcher, but can communicate a request across a socket to a server, which directs the request to an appropriate Matcher functioning within that server. These Matchers may be templates which have been parameterized to allow their use without knowledge of the actual Matcher servicing match requests, or even knowledge of match data or records. This is one of the many places where the ability of a record to convert itself between different formats comes into play.

With the previous definitions in mind, the manner in which a RestrictedPartyMatcher functions can be discussed. When either an inspection request or a match request is made, a RestrictedPartyMatchData record may be passed to a RestrictedPartyMatchDataFeatures instance, which may extract features from match data. In a presently preferred embodiment, such extracted features may include: name, address, phone, fax, email and web.

An AddressFeatures instance may be used to extract address features since an address is a nested component of a RestrictedPartyMatchData record. Once the features have been extracted, AddressMatcher, NameMatcher, PhoneMatcher, FaxMatcher, EmailMatcher and WebMatcher instances can be called to satisfy an inspection or match request. Results from individual matchers may be combined into a single result set. For match requests, results may be “filtered” to remove RestrictedPartyRecord records not satisfying the filters. Examples of such filters include, but are not limited to, date and issuing country.

Each of these steps can be influenced by various options in an initialization file, as described above. For example, an initialization file may indicate which indices are extracted, how such features are extracted, inspection or matching techniques to be used, and which filters should be applied to match results.
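The combine-then-filter step can be sketched in Python as below. The sketch assumes each feature Matcher has already returned a list of matched records that carry an id. Results are merged by id, so a record found through several features appears once, and each filter is a predicate. The record fields and the issuing-country filter are hypothetical examples.

```python
def combine_and_filter(feature_results, filters=()):
    """Merge per-feature matches by record id, then keep records passing every filter."""
    combined = {}
    for matches in feature_results.values():
        for record in matches:
            combined.setdefault(record["id"], record)
    return [r for r in combined.values() if all(f(r) for f in filters)]


results = {
    "name": [{"id": 1, "country": "US"}],
    "phone": [{"id": 1, "country": "US"}, {"id": 2, "country": "FR"}],
}
issued_by_us = lambda record: record["country"] == "US"  # hypothetical filter
print(combine_and_filter(results, [issued_by_us]))  # [{'id': 1, 'country': 'US'}]
```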

Through the system and method described above, the present invention may facilitate linguistic pattern matching by providing new means for string comparison. The present invention further adds the ability to compare non-Arabic strings, and the present invention allows such comparisons to be performed on a distributed basis.

While the preferred embodiment and various alternative embodiments of the invention have been disclosed and described in detail herein, it may be apparent to those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope thereof.

1. A method for comparing a query against data contained within a database comprising the steps of: (a) receiving said query; (b) extracting a plurality of attributes from a plurality of potential match areas from said query; (c) converting said plurality of attributes from said query, using at least one linguistic pattern matching analytical tool, into a plurality of linguistic pattern strings; (d) comparing, using at least one user selectable index property, said plurality of linguistic pattern strings with at least one stored linguistic pattern string from at least one stored attribute contained within said database for providing a set of matches; (e) analyzing said set of matches, using said at least one linguistic pattern matching analytical tool, to provide at least one set of matched attributes; (f) combining all of said at least one set of matched attributes to provide a combined result; and (g) wherein at least one of the actions of receiving, extracting, converting, comparing, analyzing, and combining is implemented using at least one data processing system.
2. The method of claim 1, wherein said matches are name matches.
3. The method of claim 1, further comprising determining whether the initials of said name matches match.
4. The method of claim 1, wherein said plurality of potential match areas are user selectable.
5. The method of claim 4, wherein said plurality of potential match areas are name, address, telephone number, facsimile number, e-mail address, and date of birth.
6. The method of claim 4, wherein said at least one linguistic pattern matching analytical tool used for converting has characteristics at least some of which are user selectable.
7. The method of claim 6, wherein said comparing is by edit distance.
8. The method of claim 6, further comprising the step of filtering said combined result according to at least one user selectable criteria.
9. The method of claim 6, further comprising the step of employing a Metaphone based analysis, a Phonex based analysis, a Soundex based analysis, an Alphabetic N-gram based analysis, a Consonant N-gram based analysis, a Numeric N-gram based analysis, an Fdi N-gram based analysis, an Fml N-gram based analysis, an edit-distance based analysis and a dictionaries based analysis.
10. The method of claim 6, further comprising designating, responsive to a match candidate containing an unusual word in an unusual words dictionary, said match candidate to be a match.
11. The method of claim 6, wherein: said query includes a party's name; and said database includes names of parties restricted from receiving certain goods.
12. The method of claim 6, further comprising transmitting said query through the Internet.
13. The method of claim 11, further including the step of filtering said combined result according to at least one user selectable criteria.
14. The method of claim 13, further comprising the step of employing at least one of a Metaphone based analysis, a Phonex based analysis, a Soundex based analysis, an N-gram based analysis, an edit-distance based analysis and a dictionaries based analysis.
15. A system for comparing a query against data contained within at least one database comprising: (a) a central processing unit having at least one electronic communications port for receiving said query, wherein said central processing unit is attached to said at least one database; (b) at least one extraction tool accessible to said central processing unit for extracting a plurality of attributes from a plurality of user selectable match areas from said query; (c) at least one linguistic pattern analytical tool having characteristics at least some of which are user selectable and being accessible to said central processing unit for converting said plurality of attributes from said query into a plurality of linguistic pattern strings, and for comparing said plurality of linguistic pattern strings with at least one stored linguistic pattern string contained within at least one of said database for providing a set of matches; (d) said at least one linguistic pattern analytical tool accessible to said central processing unit for analyzing said set of matches to provide at least one set of matched attributes; and (e) at least one combining tool accessible to said central processing unit for combining all of said at least one set of matched attributes to provide a combined result.
16. The system of claim 15, further comprising at least one filtering tool accessible to said central processing unit for filtering said combined result according to at least one user selectable criteria.
17. The system of claim 15, wherein said at least one linguistic pattern analytical tool is comprised of at least one of a Metaphone based analysis, a Phonex based analysis, a Soundex based analysis, an N-gram based analysis, an edit-distance based analysis and a dictionaries based analysis.
18. A computer-implemented method for comparing a query against data contained within a database comprising the steps of: (a) receiving said query; (b) extracting a plurality of attributes from a plurality of user selectable match areas from said query; (c) converting said plurality of attributes, using a Metaphone based linguistic pattern analytical tool, into a plurality of Metaphone linguistic pattern strings; (d) comparing, using at least one user selectable index property, at least one of said plurality of Metaphone linguistic pattern strings with said at least one stored linguistic pattern string contained within said database to provide a plurality of Metaphone matches; (e) converting said plurality of attributes, using a Phonex based linguistic pattern analytical tool, into a plurality of Phonex linguistic pattern strings; (f) comparing, using at least one user selectable index property, at least one of said plurality of Phonex linguistic pattern strings with said at least one stored linguistic pattern string contained within said database to provide a plurality of Phonex matches; (g) converting said plurality of attributes, using a Soundex based linguistic pattern analytical tool, into a plurality of Soundex linguistic pattern strings; (h) comparing, using at least one user selectable index property, at least one of said plurality of Soundex linguistic pattern strings with said at least one stored linguistic pattern string contained within said database to provide a plurality of Soundex matches; (i) converting said plurality of attributes, using an N-gram based linguistic pattern analytical tool, into a plurality of N-gram linguistic pattern strings; (j) comparing, using at least one user selectable index property, at least one of said plurality of N-gram linguistic pattern strings with at least one stored linguistic pattern string contained within said database to provide a plurality of N-gram matches; (k) combining said plurality of Metaphone matches, said plurality of Phonex matches, said plurality of Soundex matches, and said plurality of N-gram matches to form a set of combined matches; (l) analyzing said set of matches using said Metaphone based linguistic pattern analytical tool, Phonex based linguistic pattern analytical tool, said Soundex based linguistic pattern analytical tool, an edit-distance based linguistic pattern analytical tool, and a dictionaries based linguistic pattern analytical tool to provide at least one set of matched attributes; (m) combining said at least one set of matched attributes to provide a combined result; and (n) wherein at least one of the actions of (a) through (m) above is implemented using at least one data processing system.